Improving mod_perl Sites' Performance: Part 7
Correct configuration of the MinSpareServers, MaxSpareServers, StartServers, MaxClients, and MaxRequestsPerChild parameters is very important. There are no defaults that suit every site: if the values are too low, you will underutilize the system's capabilities; if they are too high, chances are that the server will bring the machine to its knees.
All the above parameters should be specified on the basis of the resources you have. With a plain Apache server it's no big deal if you run many servers, since the processes are about 1Mb each and don't eat a lot of your RAM. Generally the numbers are even smaller with memory sharing. The situation is different with mod_perl: I have seen mod_perl processes of 20Mb and more. Now, if you have MaxClients set to 50, then 50 x 20Mb = 1Gb. Maybe you don't have 1Gb of RAM, so how do you tune the parameters? Generally, by trying different combinations and benchmarking the server. Again, mod_perl processes can be made much smaller when memory is shared.
Before you start this task, you should be armed with the proper weapon: a crashme utility that loads your server with the mod_perl scripts you possess. It needs to be able to emulate a multiuser environment, with many clients calling the mod_perl scripts on your server simultaneously. While there are commercial solutions, you can get away with free ones that do the same job: the ApacheBench utility that comes with the Apache distribution, the crashme script which uses LWP::Parallel::UserAgent, httperf, or http_load, all discussed in one of the previous articles.
It is important to make sure that you run the load generator (the client which generates the test requests) on a system that is more powerful than the system being tested. After all, we are trying to simulate Internet users, where many users are trying to reach your service at once. Since the number of concurrent users can be quite large, your testing machine must be very powerful and capable of generating a heavy load. Of course, you should not run the clients and the server on the same machine. If you do, then your test results would be invalid. Clients will eat CPU and memory that should be dedicated to the server, and vice versa.
Configuration Tuning with ApacheBench
I'm going to use the ApacheBench (ab) utility to tune our server's configuration. We will simulate 10 users concurrently requesting a very light script at http://www.example.com/perl/access/access.cgi. Each simulated user makes 10 requests.
% ./ab -n 100 -c 10 http://www.example.com/perl/access/access.cgi
The results are:
Document Path: /perl/access/access.cgi
Document Length: 16 bytes
Concurrency Level: 10
Time taken for tests: 1.683 seconds
Complete requests: 100
Failed requests: 0
Total transferred: 16100 bytes
HTML transferred: 1600 bytes
Requests per second: 59.42
Transfer rate: 9.57 kb/s received
Connection Times (ms)
min avg max
Connect: 0 29 101
Processing: 77 124 1259
Total: 77 153 1360
The only numbers we really care about are:
Complete requests: 100
Failed requests: 0
Requests per second: 59.42
Let’s raise the request load to 100 x 10 (10 users, each making 100 requests):
% ./ab -n 1000 -c 10 http://www.example.com/perl/access/access.cgi
Concurrency Level: 10
Complete requests: 1000
Failed requests: 0
Requests per second: 139.76
As expected, the server copes just as well: we still have only 10 concurrent users. Now let's raise the number of concurrent users to 50:
% ./ab -n 1000 -c 50 http://www.example.com/perl/access/access.cgi
Complete requests: 1000
Failed requests: 0
Requests per second: 133.01
We see that the server is capable of serving 50 concurrent users at 133 requests per second! Let's find the upper limit. Using -n 10000 -c 1000 failed to produce results (Broken pipe?); using -n 10000 -c 500 resulted in 94.82 requests per second. The server's performance went down under the higher load.
The above tests were performed with the following configuration:
MinSpareServers 8
MaxSpareServers 6
StartServers 10
MaxClients 50
MaxRequestsPerChild 1500
Now let’s kill each child after it serves a single request. We will use the following configuration:
MinSpareServers 8
MaxSpareServers 6
StartServers 10
MaxClients 100
MaxRequestsPerChild 1
Simulate 50 users each generating a total of 20 requests:
% ./ab -n 1000 -c 50 http://www.example.com/perl/access/access.cgi
The benchmark timed out with the above configuration. I watched the output of ps as I ran it; the parent process just wasn't capable of respawning the killed children at that rate. When I raised MaxRequestsPerChild to 10, I got 8.34 requests per second. Very bad: 18 times slower! You can't benchmark the importance of MinSpareServers, MaxSpareServers and StartServers with this type of test.
Now let's reset MaxRequestsPerChild to 1500, but reduce MaxClients to 10, and run the same test:
MinSpareServers 8
MaxSpareServers 6
StartServers 10
MaxClients 10
MaxRequestsPerChild 1500
I got 27.12 requests per second, which is better but still four to five times slower. (I got 133 with MaxClients set to 50.)
Summary: I have tested a few combinations of the server configuration variables (MinSpareServers, MaxSpareServers, StartServers, MaxClients and MaxRequestsPerChild). The results I got are as follows:
- MinSpareServers, MaxSpareServers and StartServers are only important for user response times; sometimes users will have to wait a bit.
- The important parameters are MaxClients and MaxRequestsPerChild. MaxClients should not be so big that it abuses your machine's memory resources, and not so small that your users are forced to wait for a child to become free to serve them. MaxRequestsPerChild should be as large as possible, to get the full benefit of mod_perl, but watch your server at the beginning to make sure your scripts are not leaking memory and thereby causing your server (and your service) to die very fast.
Also, it is important to understand that we didn’t test the response times in the tests above, but the ability of the server to respond under a heavy load of requests. If the test script was heavier, then the numbers would be different but the conclusions similar.
The benchmarks were run with:
- HW: RS6000, 1Gb RAM
- SW: AIX 4.1.5, mod_perl 1.16, Apache 1.3.3
- Machine running only mysql, httpd docs and mod_perl servers.
- Machine was _completely_ unloaded during the benchmarking.
After each server restart when I changed the server’s configuration, I made sure that the scripts were preloaded by fetching a script at least once for every child.
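For example, a quick way to warm up the children is to hit the script more times than there are StartServers. Here is a minimal sketch using LWP::Simple; the URL and the request count are just placeholders, and since Apache decides which child accepts each connection, this does not strictly guarantee that every child gets hit, but a generous count usually covers them all:

#!/usr/bin/perl -w
# warmup.pl: fetch the test script enough times that every child
# (StartServers is 10 in the configurations used here) has most likely
# compiled it at least once.
use strict;
use LWP::Simple qw(get);

my $url   = 'http://www.example.com/perl/access/access.cgi';  # script under test
my $times = 30;                                               # well above StartServers

for my $i (1 .. $times) {
    defined get($url) or warn "request $i failed\n";
}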
It is important to notice that none of the requests timed out, even when a request was kept in the server's queue for more than a minute! That is the way ab works, which is fine for testing purposes but unacceptable in the real world: users will not wait more than five to ten seconds for a request to complete, and the client (i.e. the browser) will time out after a few minutes.
Now let’s take a look at some real code whose execution time is more than a few milliseconds. We will do some real testing and collect the data into tables for easier viewing.
I will use the following abbreviations:
NR = Total Number of Requests
NC = Concurrency
MC = MaxClients
MRPC = MaxRequestsPerChild
RPS = Requests per second
Running a mod_perl script with lots of MySQL queries (the script under test is mysqld-limited), at http://www.example.com/perl/access/access.cgi?do_sub=query_form, with the configuration:
MinSpareServers 8
MaxSpareServers 16
StartServers 10
MaxClients 50
MaxRequestsPerChild 5000
gives us:
NR NC RPS comment
------------------------------------------------
10 10 3.33 # not a reliable figure
100 10 3.94
1000 10 4.62
1000 50 4.09
Conclusions: Here I wanted to show that when the application is slow, not because of Perl loading, code compilation and execution, but because it is limited by some external operation, it almost does not matter what load we place on the server; the RPS (requests per second) stays almost the same. Given that all the requests have been served, you have the ability to queue the clients, but be aware that anything that goes into the queue means a waiting client and a client (browser) that might time out!
Now we will benchmark the same script without the MySQL part (code limited by Perl only), at http://www.example.com/perl/access/access.cgi. It's the same script, but it just returns the HTML form without making any SQL queries.
MinSpareServers 8
MaxSpareServers 16
StartServers 10
MaxClients 50
MaxRequestsPerChild 5000
NR NC RPS comment
------------------------------------------------
10 10 26.95 # not a reliable figure
100 10 30.88
1000 10 29.31
1000 50 28.01
1000 100 29.74
10000 200 24.92
100000 400 24.95
Conclusions: This time the script we executed was pure Perl (not limited by I/O or MySQL), so we see that the server serves the requests much faster. The number of requests per second is almost the same for any load, but drops when the number of concurrent clients goes beyond MaxClients. At 25 RPS, the simulated load of 400 concurrent clients will all be served in 16 seconds. To be more realistic, assuming a maximum of 100 concurrent clients and 30 requests per second, the clients will be served within about 3.5 seconds. Pretty good for a highly loaded server.
Now we will use the server to its full capacity, by keeping all MaxClients children alive all the time and using a big MaxRequestsPerChild, so that no child will be killed during the benchmarking.
MinSpareServers 50
MaxSpareServers 50
StartServers 50
MaxClients 50
MaxRequestsPerChild 5000
NR NC RPS comment
------------------------------------------------
100 10 32.05
1000 10 33.14
1000 50 33.17
1000 100 31.72
10000 200 31.60
Conclusion: In this scenario, there is no overhead involving the parent server loading new children, all the servers are available, and the only bottleneck is contention for the CPU.
Now we will change MaxClients and watch the results. Let's reduce MaxClients to 10:
MinSpareServers 8
MaxSpareServers 10
StartServers 10
MaxClients 10
MaxRequestsPerChild 5000
NR NC RPS comment
------------------------------------------------
10 10 23.87 # not a reliable figure
100 10 32.64
1000 10 32.82
1000 50 30.43
1000 100 25.68
1000 500 26.95
2000 500 32.53
Conclusions: Very little difference! Ten servers were able to serve with almost the same throughput as 50. Why? My guess is CPU throttling: it seems that the 10 servers were serving requests five times faster than when we worked with 50 servers, where each child received its CPU time slice five times less frequently. So having a big value for MaxClients doesn't mean that the performance will be better. You have just seen the numbers!
Now we will drastically reduce MaxRequestsPerChild:
MinSpareServers 8
MaxSpareServers 16
StartServers 10
MaxClients 50
NR NC MRPC RPS comment
------------------------------------------------
100 10 10 5.77
100 10 5 3.32
1000 50 20 8.92
1000 50 10 5.47
1000 50 5 2.83
1000 100 10 6.51
Conclusions: When we drastically reduce MaxRequestsPerChild, the performance starts to approach that of plain mod_cgi.
Here are the numbers of this run with mod_cgi, for comparison:
MinSpareServers 8
MaxSpareServers 16
StartServers 10
MaxClients 50
NR NC RPS comment
------------------------------------------------
100 10 1.12
1000 50 1.14
1000 100 1.13
Conclusion: mod_cgi is much slower. :) In the first test, when NR/NC was 100/10, mod_cgi was capable of 1.12 requests per second. In the same circumstances, mod_perl was capable of 32 requests per second, nearly 30 times faster! In the first test each client waited about 100 seconds to be served; in the second and third tests they waited about 1,000 seconds!
Choosing MaxClients
The MaxClients directive sets the limit on the number of simultaneous requests that can be supported; no more than this number of child server processes will be created. To configure more than 256 clients, you must edit the HARD_SERVER_LIMIT entry in httpd.h and recompile. In our case we want this variable to be as small as possible, so that we can limit the resources used by the server's children. Since we can restrict each child's process size with Apache::SizeLimit or Apache::GTopLimit, the calculation of MaxClients is pretty straightforward:
Total RAM Dedicated to the Webserver
MaxClients = ------------------------------------
MAX child's process size
So if I have 400Mb left for the Web server to run with, I can set MaxClients to 40 if I know that each child is limited to 10Mb of memory (e.g. with Apache::SizeLimit).
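As a rough sketch of how such a limit might be set with the mod_perl 1.x Apache::SizeLimit interface, the following could go into a startup.pl file (the 10Mb figure matches the example above; the variable names and units should be checked against the documentation of your installed version):

# startup.pl - a sketch, assuming the mod_perl 1.x Apache::SizeLimit API
use Apache::SizeLimit;

# sizes are given in KB; kill a child once its total size exceeds ~10Mb
$Apache::SizeLimit::MAX_PROCESS_SIZE = 10000;

# only perform the check every other request to keep the overhead low
$Apache::SizeLimit::CHECK_EVERY_N_REQUESTS = 2;

1;

The check itself is usually hooked in from httpd.conf with a PerlFixupHandler Apache::SizeLimit line.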
You may be wondering what will happen to your server if there are more concurrent users than MaxClients at any time. This situation is signified by the following warning message in the error_log:
[Sun Jan 24 12:05:32 1999] [error] server reached MaxClients setting,
consider raising the MaxClients setting
There is no problem: any connection attempts over the MaxClients limit will normally be queued, up to a number based on the ListenBacklog directive. When a child process is freed at the end of a different request, the queued connection will be served.
It is logged as an error because clients are being put into the queue rather than being served immediately, even though they do not receive an error response. The condition can be tolerated as a trade-off between available system resources and response time, but sooner or later you will need to get more RAM so you can start more child processes. The best approach is to try not to reach this condition at all, and if you reach it often you should start to worry about it.
It's important to understand how much real memory a child occupies. Your children can share memory between them when the OS supports that, but you must take action to allow the sharing to happen; we discussed this in a previous article whose main topic was shared memory. If you do this, chances are that your MaxClients can be even higher, but it seems that it's not so simple to calculate the absolute number. If you come up with a solution, please let us know! If the shared memory were of the same size throughout the child's life, we could derive a much better formula:
Total_RAM + Shared_RAM_per_Child * (MaxClients - 1)
MaxClients = ---------------------------------------------------
Max_Process_Size
which is:
Total_RAM - Shared_RAM_per_Child
MaxClients = ---------------------------------------
Max_Process_Size - Shared_RAM_per_Child
Let’s roll some calculations:
Total_RAM = 500Mb
Max_Process_Size = 10Mb
Shared_RAM_per_Child = 4Mb
500 - 4
MaxClients = --------- = 82
10 - 4
With no sharing in place
500
MaxClients = --------- = 50
10
With sharing in place you can have 64 percent more servers without buying more RAM.
If you improve sharing and maintain that level throughout the child's life, let's say:
Total_RAM = 500Mb
Max_Process_Size = 10Mb
Shared_RAM_per_Child = 8Mb
500 - 8
MaxClients = --------- = 246
10 - 8
392 percent more servers! Now you can feel the importance of having as much shared memory as possible.
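The arithmetic is simple enough to put into a throwaway script. Here is a sketch (the subroutine name is made up, and the sample figures are just the ones used above, with all sizes in Mb):

#!/usr/bin/perl -w
# maxclients.pl - compute MaxClients from the formulas above (sizes in Mb)
use strict;

sub max_clients {
    my ($total_ram, $max_process_size, $shared_per_child) = @_;
    $shared_per_child ||= 0;    # no sharing: plain Total_RAM / Max_Process_Size
    return int( ($total_ram        - $shared_per_child)
              / ($max_process_size - $shared_per_child) );
}

print max_clients(500, 10),    "\n";   # no sharing:        50
print max_clients(500, 10, 4), "\n";   # 4Mb shared/child:  82
print max_clients(500, 10, 8), "\n";   # 8Mb shared/child: 246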
Choosing MaxRequestsPerChild
The MaxRequestsPerChild directive sets the limit on the number of requests that an individual child server process will handle. After MaxRequestsPerChild requests, the child process will die. If MaxRequestsPerChild is 0, the process will live forever.
Setting MaxRequestsPerChild to a non-zero limit solves some memory leakage problems caused by sloppy programming practices, where a child process consumes a little more memory after each request.
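A classic example of such sloppiness is accumulating data in a variable that lives for the whole life of the child. The handler below is contrived and purely illustrative (the package name is made up), but it shows the pattern:

# Apache/LeakyExample.pm - a contrived mod_perl 1.x handler that "leaks":
# @log grows on every request and is never cleared, so the child's
# memory footprint keeps increasing.
package Apache::LeakyExample;
use strict;
use Apache::Constants qw(OK);

my @log;   # lives as long as the child process does

sub handler {
    my $r = shift;
    push @log, $r->uri;                     # grows forever
    $r->send_http_header('text/plain');
    $r->print("this child has logged ", scalar(@log), " requests\n");
    return OK;
}

1;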
If left unbounded, then after a certain number of requests the children will use up all the available memory and leave the server to die from memory starvation. Note that sometimes standard system libraries leak memory too, especially on OSes with bad memory management (e.g. Solaris 2.5 on x86 arch).
If this is your case, you can set MaxRequestsPerChild to a small number. This allows the system to reclaim the memory that a greedy child process has consumed when it exits after MaxRequestsPerChild requests.
But beware: if you set this number too low, you will lose some of the speed bonus you get from mod_perl. Consider using Apache::PerlRun if this is the case.
Another approach is to use the Apache::SizeLimit or Apache::GTopLimit modules. By using either of these modules you should be able to stop relying on MaxRequestsPerChild, although for some developers using both in combination does the job. In addition, the latter module allows you to kill any server whose shared memory size drops below a specified limit.
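A sketch of the latter, again for startup.pl and again with made-up figures (the package variable names should be verified against the Apache::GTopLimit documentation of your version; sizes are in KB):

# startup.pl - a sketch, assuming the Apache::GTopLimit interface
use Apache::GTopLimit;

# kill a child once its total size grows beyond ~10Mb ...
$Apache::GTopLimit::MAX_PROCESS_SIZE = 10000;

# ... or once its shared portion drops below ~4Mb
$Apache::GTopLimit::MIN_PROCESS_SHARED_SIZE = 4000;

1;

As with Apache::SizeLimit, the check is typically enabled with a PerlFixupHandler Apache::GTopLimit line in httpd.conf.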
References
- The mod_perl site: http://perl.apache.org/
- The Apache::GTopLimit module